Using CUDA as the programming language, we create a code named CuBA, 
based on the CPU code "Boltzmann Approach for Many Parton 
Scattering (BAMPS)" developed in Frankfurt, in order to study a system of 
many colliding particles produced in heavy ion collisions. Furthermore, 
we benchmark our code with the Riemann problem and compare the results 
with BAMPS. The comparison demonstrates an improvement of the 
computational runtime by one order of magnitude. 

PACS numbers: 11.15.Ha, 12.38.Gc, 12.38.Mh 

1. Introduction 

Building on the BAMPS code developed in Frankfurt by C. Greiner, 
Z. Xu et al., we study the interaction between the gluons of a 
gluon gas produced at the onset of heavy ion collisions [1]. We use CUDA 
as the programming language to create the code CuBA, "The Boltzmann 
Approach for Many Parton Scattering written with CUDA" [2]. We expect to 
obtain an improvement of the computational runtime. In addition, both codes 
are benchmarked with the Riemann problem in order to compare the results 
of the two programs. 

In this paper we present the physical concepts behind this program, 
the CUDA language and finally the first results obtained. 

* Presented by Ulrike Eilhauer at the International Meeting "Excited QCD", Peniche, 
Portugal, 6 - 12 May, 2012 
2. Theory 

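We are especially interested in solving the Riemann problem in viscous 
matter using the relativistic Boltzmann equation, 

$$\left( \frac{\partial}{\partial t} + \frac{\mathbf{p}_1}{E_1} \cdot \nabla_r + \mathbf{F} \cdot \nabla_{p_1} \right) f_1 = \int d^3p_2 \, d^3p'_1 \, d^3p'_2 \, \delta^4(P_f - P_i) \, |T_{fi}|^2 \left( f'_2 f'_1 - f_2 f_1 \right)$$

(1) 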

To achieve a good compromise between computational runtime and physical 
accuracy, we apply a microscopic theory together with strong assumptions, 
such as neglecting quantum mechanical effects. 

The main idea for solving the Boltzmann equation with the Particle-In-Cell 
(PIC) method consists in dividing a given volume into many cells of volume 
$V_{\mathrm{cell}} = \Delta x \, \Delta y \, \Delta z$, containing $N$ 
particles that are subject to propagation and collision rules in a time 
interval $\Delta t$. Each particle has its own position $\mathbf{r}$ and 
momentum $\mathbf{p}$. If a particle does not collide, its propagation is 
given by 

$$x \rightarrow x + v_x \Delta t = x + c^2 \frac{p_x}{E} \Delta t$$

(2) 

The same is valid for the y and z directions. 
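
The free-streaming step of Eq. (2) can be sketched in a few lines of Python. This is only an illustration, not the CuBA implementation: the function name `free_stream` is hypothetical, the particle is assumed to be massless (so that E = c|p|), and natural units with c = 1 are assumed as the default.

```python
import math

def free_stream(position, momentum, dt, c=1.0):
    """Move one non-colliding particle by one time step dt (Eq. 2).

    position and momentum are 3-tuples (x, y, z) and (px, py, pz).
    Assumption: the particle is massless, so E = c * |p|.
    """
    energy = c * math.sqrt(sum(p * p for p in momentum))
    # x -> x + c^2 * (p_x / E) * dt, and the same for y and z
    return tuple(x + c * c * p / energy * dt for x, p in zip(position, momentum))

# Example: a particle starting at the origin with momentum (0, 3, 4) GeV
new_position = free_stream((0.0, 0.0, 0.0), (0.0, 3.0, 4.0), dt=0.1)
```

With these inputs the particle moves a distance c dt = 0.1 along the direction of its momentum, as expected for a massless particle.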

On the other hand, it is important to note that the collisions are 
binary and can only occur between particles in the same cell. The 
probability for a collision to occur within $\Delta t$ is sampled with the 
Monte Carlo method and is given by 

$$P_{22} = v_{\mathrm{rel}} \, \frac{\sigma}{N_{\mathrm{test}}} \, \frac{\Delta t}{V_{\mathrm{cell}}}$$

(3) 

where $\sigma$ is the total cross section, which is considered to be 
isotropic, and $v_{\mathrm{rel}}$ is the relative velocity, 
$v_{\mathrm{rel}} = s / (2 E_1 E_2)$, where $s$ is the Mandelstam 
variable, $s = (p_1 + p_2)^2$ [1]. 
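
As an illustration of Eq. (3), the following Python sketch evaluates the collision probability for one pair of particles and then takes the Monte Carlo decision. The function names, the cell volume of 1 fm^3 and the two example four-momenta are assumptions made for this example only. The cross section is converted from GeV^-2 to fm^2 with the conversion constant hbar c, so that the probability is dimensionless for c = 1.

```python
import random

HBARC = 0.1973269804  # GeV fm, conversion constant hbar * c

def mandelstam_s(p1, p2):
    """s = (p1 + p2)^2 for four-momenta given as (E, px, py, pz) in GeV."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    return e * e - px * px - py * py - pz * pz

def collision_probability(p1, p2, sigma_fm2, n_test, dt, v_cell):
    """P22 = v_rel * (sigma / N_test) * (dt / V_cell), with v_rel = s / (2 E1 E2)."""
    v_rel = mandelstam_s(p1, p2) / (2.0 * p1[0] * p2[0])
    return v_rel * sigma_fm2 / n_test * dt / v_cell

def collides(probability, rng):
    """Monte Carlo decision: accept the collision if a uniform number is below P22."""
    return rng.random() < probability

sigma_fm2 = 10.0 * HBARC ** 2   # a cross section of 10 GeV^-2 expressed in fm^2
p1 = (1.0, 0.0, 0.0, 1.0)       # hypothetical massless particle moving along +z
p2 = (1.0, 0.0, 0.0, -1.0)      # hypothetical massless particle moving along -z
prob = collision_probability(p1, p2, sigma_fm2, n_test=1, dt=0.1, v_cell=1.0)
happened = collides(prob, random.Random(0))
```

For two parallel massless particles s vanishes, so the probability is zero, and dividing by a larger `n_test` lowers the probability by the same factor.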

To reduce statistical fluctuations and to preserve the accuracy of the 
intended solution, we use the testparticle method. It consists in 
introducing $N_{\mathrm{test}} = r_{\mathrm{test}} N$ particles, where 
$r_{\mathrm{test}}$ is a chosen factor that increases the number of 
particles. To keep the mean free path $\lambda$ independent of 
$N_{\mathrm{test}}$, we reduce the probability $P_{22}$ by the same factor 
$r_{\mathrm{test}}$. To obtain the direction of the outgoing momenta, we 
boost from the plasma frame to the center of mass frame by applying a 
Lorentz transformation. In the center of mass frame we choose the momentum 
direction randomly. After that, we boost back to the original frame. 
If a particle hits one of the six walls of the box volume, it is 
elastically reflected. 
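
The collision kinematics described above (boost to the center of mass frame, random isotropic direction, boost back) can be sketched as follows. This is a minimal example under stated assumptions: the particles are massless, c = 1, and the names `boost` and `scatter_isotropic` as well as the two incoming four-momenta are hypothetical. Two exactly parallel massless particles have s = 0 and no finite boost to their center of mass frame, but such a pair also has zero collision probability.

```python
import math
import random

def boost(p, beta):
    """Lorentz-boost p = (E, px, py, pz) into a frame moving with velocity beta."""
    b2 = sum(b * b for b in beta)
    if b2 == 0.0:
        return tuple(p)
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = sum(b * q for b, q in zip(beta, p[1:]))
    factor = (gamma - 1.0) * bp / b2 - gamma * p[0]
    return (gamma * (p[0] - bp),) + tuple(q + factor * b for q, b in zip(p[1:], beta))

def scatter_isotropic(p1, p2, rng):
    """Return the outgoing four-momenta of an isotropic 2 -> 2 collision."""
    total = tuple(a + b for a, b in zip(p1, p2))
    beta_cm = tuple(q / total[0] for q in total[1:])
    e_cm = boost(p1, beta_cm)[0]   # equals sqrt(s) / 2 for massless particles
    # Isotropic direction: cos(theta) uniform in [-1, 1], phi uniform in [0, 2 pi)
    cos_t = rng.uniform(-1.0, 1.0)
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    n = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
    q1 = (e_cm,) + tuple(e_cm * x for x in n)
    q2 = (e_cm,) + tuple(-e_cm * x for x in n)
    back = tuple(-b for b in beta_cm)  # boost back to the original frame
    return boost(q1, back), boost(q2, back)

p1 = (1.0, 0.0, 0.0, 1.0)   # hypothetical incoming massless particles
p2 = (2.0, 2.0, 0.0, 0.0)
out1, out2 = scatter_isotropic(p1, p2, random.Random(1))
```

Whatever direction is drawn, the total energy and momentum of the pair are the same before and after the collision, and both outgoing particles remain massless.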
3. CUDA language 

CUDA is a language for parallel programming on GPUs, which has recently 
started to be used in numerical computations in physics due to the 
potential performance increase of an order of magnitude. 

The CUDA logic consists of writing kernel functions, which carry out the 
physical calculations on the device, and calling them from the host. The 
device is organized in grids, which can contain up to $65535^3$ blocks on 
Fermi architectures and $65535^2$ blocks on older architectures. Each block 
has 256 threads. The position and momentum of each particle in $\Delta t$ 
are stored in a thread. We point out that the big advantage of using CUDA 
lies in the fast shared memory region that can be shared among the threads 
of the same block [3] [1]. 
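
To make the block and thread layout concrete, the short Python sketch below mirrors the index arithmetic that a one-thread-per-particle kernel would typically use. It is not CUDA code and not taken from CuBA: the helper names are hypothetical, and a one-dimensional grid is assumed. In a CUDA kernel the same global index is usually written as `blockIdx.x * blockDim.x + threadIdx.x`.

```python
THREADS_PER_BLOCK = 256   # block size quoted in the text
MAX_GRID_DIM = 65535      # per-dimension grid limit quoted in the text

def launch_config(n_particles, threads_per_block=THREADS_PER_BLOCK):
    """Return (blocks, threads per block) for a 1D grid covering all particles."""
    blocks = (n_particles + threads_per_block - 1) // threads_per_block  # ceiling division
    if blocks > MAX_GRID_DIM:
        raise ValueError("a one-dimensional grid cannot hold this many blocks")
    return blocks, threads_per_block

def global_index(block_idx, thread_idx, threads_per_block=THREADS_PER_BLOCK):
    """Particle index handled by a given thread of a given block."""
    return block_idx * threads_per_block + thread_idx

blocks, threads = launch_config(1000)
last_particle = global_index(3, 231)
```

For 1000 particles this gives 4 blocks of 256 threads. The threads whose global index is 1000 or larger have no particle to work on, so a kernel would have to skip them with a bounds check.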

4. Flowchart 

Our code structure is presented in figure 1. 

[Figure 1: flowchart with the following boxes] 
main program: initial conditions and final results 
cell.h: defines the parameters of the cell 
cell_gpu.cu: calls the kernel functions 
reduction_kernel.cu 
cell_kernel.cu: kernel functions 
cell_device.cu: subkernel functions 

Fig. 1. Flowchart of our CuBA code. 



5. Results 

To test our code we have to take into account the initial conditions 
we choose. The two important parameters are the time step $\Delta t$ and 
the spatial step $\Delta x$, since we consider a transversely homogeneous 
plane. $\Delta t$ is always chosen to be smaller than $\Delta x$ to avoid 
large local variations in one time step. If we increase $\Delta x$, we have 
to increase the testparticle number $N_{\mathrm{test}}$. 
The more testparticles we have, the closer the curve of the Riemann problem 
comes to the theoretical solution. A small testparticle number increases 
the statistical fluctuations. To simulate an ideal fluid we may choose a 
very small viscosity. 

First, we check some numerical solutions of CuBA for various 
parameters. To start, we take the box volume to be $32^3$ fm$^3$, the cross 
section $\sigma = 10$ GeV$^{-2}$ and $dt = 0.1$ fm/c, with an equal 
particle distribution at the beginning and different temperatures on each 
side of the box, $T_{\mathrm{left}} = 0.4$ GeV and 
$T_{\mathrm{right}} = 0.2$ GeV. The total energy is conserved, as 
expected. The time evolution is shown in figure 2. 
Fig. 2. Evolution of the energy density, shown for different time slices 
$\Delta t$. The propagation of the two waves from the initial boundary of 
the Riemann problem is clearly visible. 
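
An initial state of this kind, a box with a hot left half and a cold right half, could be generated as in the sketch below. Several points are assumptions that the text does not fix: the box is centered at x = 0, the particles are massless and follow Boltzmann statistics (so that the energy follows a Gamma distribution with shape 3 and scale T), both halves receive the same number of particles, and the function name `sample_particle` is hypothetical.

```python
import math
import random

BOX = 32.0                   # box edge length in fm (from the text)
T_LEFT, T_RIGHT = 0.4, 0.2   # temperatures in GeV (from the text)

def sample_particle(left, rng):
    """Return (position, four-momentum) of one massless testparticle."""
    temperature = T_LEFT if left else T_RIGHT
    # Assumption: the box is centered at x = 0 and the two halves meet at x = 0.
    x = rng.uniform(-BOX / 2, 0.0) if left else rng.uniform(0.0, BOX / 2)
    y = rng.uniform(-BOX / 2, BOX / 2)
    z = rng.uniform(-BOX / 2, BOX / 2)
    # Boltzmann statistics: dN/dE ~ E^2 exp(-E/T), i.e. a Gamma(3, T) distribution.
    energy = rng.gammavariate(3.0, temperature)
    # Isotropic momentum direction
    cos_t = rng.uniform(-1.0, 1.0)
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    momentum = (energy, energy * sin_t * math.cos(phi),
                energy * sin_t * math.sin(phi), energy * cos_t)
    return (x, y, z), momentum

rng = random.Random(0)
left = [sample_particle(True, rng) for _ in range(20000)]
right = [sample_particle(False, rng) for _ in range(20000)]
mean_e_left = sum(p[0] for _, p in left) / len(left)      # close to 3 * T_LEFT
mean_e_right = sum(p[0] for _, p in right) / len(right)   # close to 3 * T_RIGHT
```

The mean energy per particle of a massless Boltzmann gas is 3T, which gives a quick consistency check on the sampled momenta.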

In addition, we observe in figure 2 the typical profile of the Riemann 
problem. This problem consists of a propagating shock wave, which arises 
because the initial conditions impose different temperatures [5]. 

Secondly, we vary the cross section, keeping the other variables 
constant at the values given above. The differences can be seen in figure 3. 

As we can verify, the slope, which physically corresponds to a larger 
viscosity, fades as the cross section is increased. 

Fig. 3. Evolution of the local cross section, shown for different time-slices. 

Finally, to compare our results with the BAMPS code we choose the same 
initial conditions in both codes, namely the ones given at the beginning 
of this section. 
Fig. 4. Comparison of the energy density of BAMPS (red points) with CuBA 
(blue points). Both codes produce the same results, except for statistical 
fluctuations. 
In figure 4 we can clearly identify the overlap of the results obtained 
with CuBA (blue points) and BAMPS (red points). 

While the BAMPS code takes 12 minutes and 36 seconds to calculate 
the data, CuBA needs just 58.09 seconds. 
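
From these two timings the speed-up follows directly:

```latex
\[
\frac{t_{\mathrm{BAMPS}}}{t_{\mathrm{CuBA}}}
  = \frac{12 \times 60\,\mathrm{s} + 36\,\mathrm{s}}{58.09\,\mathrm{s}}
  = \frac{756\,\mathrm{s}}{58.09\,\mathrm{s}}
  \approx 13.0
\]
```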

6. Conclusions 

The resulting data can be used to validate the CPU code and to improve 
the study of colliding particles. For now we can say that CuBA is about 13 
times faster than BAMPS. 

In the near future we intend to make the time step dt variable and to 
optimize the computational runtime of our code. Furthermore, we will check 
our code with other initial conditions and compare it with BAMPS. 

As a final result, we expect to obtain a code that is able to solve any 
problem of this type and that is as fast as CUDA allows. 